SPMLS: An Efficient Sequential Pattern Mining Algorithm with Candidate Generation and Frequency Testing
Abstract
Sequential pattern mining is a fundamental field of data mining because of its wide range of applications, from forecasting user shopping patterns to scientific discovery. The objective is to discover frequently occurring sequential patterns in a given set of sequences. Many recent studies have contributed to the efficiency of sequential pattern mining algorithms. Most existing algorithms have proven effective; however, they do not perform well when mining long frequent sequences in a database. In this paper, we propose an efficient pattern mining algorithm, SPMLS (Sequential Pattern Mining on Long Sequences), for mining long sequential patterns in a given database. SPMLS uses an iterative process of candidate generation followed by frequency testing in two phases, event-wise and sequence-wise. The event-wise phase presents a new candidate pruning approach that improves the efficiency of the mining process. The sequence-wise phase integrates considerations of intra-event and inter-event constraints. Simulations are carried out on both synthetic and real datasets to evaluate the performance of SPMLS.
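The general candidate-generation and frequency-testing loop mentioned in the abstract can be illustrated with a minimal sketch. This is not SPMLS itself: it is a generic level-wise (Apriori-style) miner, it assumes every event contains a single item, and it omits the event-wise and sequence-wise phases and the intra-event and inter-event constraints. The function names `is_subsequence`, `support`, and `mine_sequential_patterns` are hypothetical and chosen only for this illustration.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain `pattern`."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


def mine_sequential_patterns(database, min_support):
    """Generic generate-and-test loop (illustrative, not the SPMLS algorithm)."""
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    # Level 1: frequent single-item patterns.
    level = [(i,) for i in items if support((i,), database) >= min_support]
    while level:
        for p in level:
            frequent[p] = support(p, database)
        level_set = set(level)
        candidates = set()
        # Candidate generation: join p and q when p minus its first item
        # equals q minus its last item.
        for p in level:
            for q in level:
                if p[1:] == q[:-1]:
                    cand = p + (q[-1],)
                    # Pruning: every subsequence obtained by deleting one
                    # item must already be frequent.
                    if all(cand[:i] + cand[i + 1:] in level_set
                           for i in range(len(cand))):
                        candidates.add(cand)
        # Frequency testing: keep candidates that meet the threshold.
        level = sorted(c for c in candidates
                       if support(c, database) >= min_support)
    return frequent


if __name__ == "__main__":
    db = [list("abcd"), list("acd"), list("abd"), list("bcd")]
    for pattern, count in mine_sequential_patterns(db, 3).items():
        print("".join(pattern), count)
```

Each pass rescans the database once per candidate, which is exactly the cost that grows quickly on long sequences and that pruning strategies aim to reduce.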
Similar resources
Mining Constraint-based Multidimensional Frequent Sequential Pattern in Web Logs
In this paper we introduce an efficient strategy for discovering Web usage mining is the application of data mining techniques to discover usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Web usage mining consists of three phases, namely preprocessing, pattern discovery, and pattern analysis. This paper describes each of these phases in ...
Multi-Level Weighted Sequential Pattern Mining Based on Prime Encoding
Encoding can express the hierarchical relationship in the area of mining the multi-level sequential pattern, up to now all the algorithms of which find frequent sequences just according to frequency, but items have different importance in the real applications, therefore the weight constraint involved to the entire mining process is crucial. The MWSP algorithm based on the candidate generation-...
CAMLS: A Constraint-Based Apriori Algorithm for Mining Long Sequences
Mining sequential patterns is a key objective in the field of data mining due to its wide range of applications. Given a database of sequences, the challenge is to identify patterns which appear frequently in different sequences. Well known algorithms have proved to be efficient, however these algorithms do not perform well when mining databases that have long frequent sequences. We present CAM...
Sequential Pattern Mining by Pattern-Growth: Principles and Extensions
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a challenging problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Recent studies have developed two major classes of sequential pattern mining methods: (1) a candidate generation-and-test approach, represented by (i) GSP...
ShrFP-Tree: An Efficient Tree Structure for Mining Share-Frequent Patterns
Share-frequent pattern mining discovers more useful and realistic knowledge from database compared to the traditional frequent pattern mining by considering the non-binary frequency values of items in transactions. Therefore, recently share-frequent pattern mining problem becomes a very important research issue in data mining and knowledge discovery. Existing algorithms of share-frequent patter...
Publication date: 2012